ABSTRACT:

A per-flow queuing method and apparatus for IP networks carrying traffic from feedback 
controlled TCP connections enables flow of information packets from one or more sources to a 
destination through a link and comprises a buffer of predetermined size partitioned into a plurality
of queues, each queue being allocated an occupancy b_i for receiving and temporarily storing
packets of information; a scheduler for removing packets from each buffer according to a 
predetermined rate and transmitting the packets over a network; and a control device for 
determining availability of queues in the buffer capable of receiving the packet and inputting the 
packet into a queue if the queue is available, the control device further selecting a queue and 
releasing a packet from the selected queue to accommodate input of the received packet when 
the queue is not available. Increased fairness and packet throughput through the link are
achieved when the queue for dropping a packet is selected in accordance with a longest queue
first or random drop scheme and when a drop-from-front strategy for ACK packets is employed.
(54) A method for supporting per-connection queuing for feedback-controlled traffic






FIG. 3A 
queuing scheme in the context of controlling traffic in
and improving performance of feedback-controlled TCP
networks.

Moreover, it would be highly desirable to implement
a fair queuing scheme implementing a packet dropping
mechanism that enables fair throughputs for TCP connections.

Summary of the Invention 


The instant invention is a per-flow/connection,
shared-buffer management scheme to be used in conjunction
with a fair queuing scheduler in a feedback-controlled
TCP network, so as to achieve the following goals for TCP:
1) alleviate the inherent unfairness of TCP towards
connections with long round-trip times; 2) provide
isolation when connections using different TCP versions
share a bottleneck link; 3) provide protection from more
aggressive traffic sources, misbehaving users or from
other TCP connections in the case of reverse path congestion;
4) alleviate the effects of ACK compression in
the presence of two-way traffic; 5) prevent users experiencing
ACK loss (which causes their traffic to be
bursty) from significantly affecting other connections; 6)
provide low latency to interactive connections which
share a bottleneck with "greedy" connections without
reducing overall link utilization.

More particularly, a shared buffer architecture is implemented
with bandwidth reservation guarantees r_i for
each connection. Given a first connection of rate r_1 that is
fully using its buffer and a second connection of rate r_2
whose buffer is being underutilized (wasted), then if that first
connection needs more bandwidth, it may borrow the buffering
(bandwidth) from the r_2 connection in the shared
buffer scheme. When the second connection needs to
reclaim its buffer space, then data from another utilized
buffer needs to be pushed out to make room for the
incoming data packets. The per-connection queue management
scheme supports a packet dropping mechanism,
such as longest queue first ("LQF"), in a shared buffer
architecture, resulting in better TCP performance
than FIFO-RED buffer management schemes. Fairness
is measured by the ratio of the standard deviation to the
mean of the individual throughputs, each expressed
as a fraction of the total integrated link capacity.
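
The fairness measure just described can be sketched as follows. This is only an illustrative reading of the text: the function name, the use of the population standard deviation, and the sample numbers are assumptions, not taken from the specification.

```python
from statistics import mean, pstdev


def fairness_coefficient(throughputs, link_capacity):
    """Ratio of standard deviation to mean of per-connection throughputs.

    Each throughput is first expressed as a fraction of the link capacity.
    A value of 0.0 means all connections received identical throughput.
    Assumption: population standard deviation (pstdev) is used.
    """
    shares = [t / link_capacity for t in throughputs]
    avg = mean(shares)
    if avg == 0:
        raise ValueError("no throughput measured on any connection")
    return pstdev(shares) / avg


# Hypothetical example: two connections sharing a link of capacity 4.0.
print(fairness_coefficient([2.0, 2.0], 4.0))  # perfectly fair split
print(fairness_coefficient([1.0, 3.0], 4.0))  # uneven split
```

A lower coefficient indicates a fairer division of the link among the connections.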

Brief Description of Drawings 

Figure 1 is a diagram illustrating a TCP network
connection.

Figure 2 is a block diagram of a shared buffer architecture
for multiple TCP connections.

Figures 3(a)-3(c) illustrate the methodology implemented
for effecting buffer allocation and packet dropping
schemes, with Fig. 3(b) illustrating the LQF packet
drop method and Fig. 3(c) illustrating the RND packet
drop method.

Figure 4 is a diagram illustrating the hardware associated
with each buffer queue.

Figure 5 illustrates the simulation of a TCP/IP network
having a router implementing fair queuing.

Figures 6(a) and 6(b) illustrate the improved TCP performance
measured by throughput (Fig. 6(a)) and fairness
coefficient (Fig. 6(b)) as a function of buffer size for 20
TCP connections over a bottleneck link in the simulated
network of Fig. 5.

Detailed Description of the Invention 

Figure 2 illustrates the per-connection queue architecture
for the router 55 of a TCP network connection
handling packet traffic originating from a variety of
sources S_1, ..., S_i. The network connection element includes
a global, shared buffer memory B partitioned to
form a plurality of i queues 30a, ..., 30i, for connections with
a single (bottleneck) link 55 and scheduler 75 servicing
data packets on the link 55 at a rate C. Each buffer
connection i has a nominal buffer allocation b_i which is that
connection i's guaranteed buffer size. In this architecture,
known as "per-flow queuing" or "per-connection
queuing", fine-grained dynamic classification of the
arriving packets to be queued is required. A "soft state"
approach is assumed to maintain the connection state,
which leads to a potentially very high number of presumably
active connections, where the large majority may
actually not be active any more and the associated state
in the network node is just waiting to be timed out,
reclaimed or deleted by any other means of garbage
collection. Therefore scalability and sharing are primary
requirements for a per-connection server. All operations
required are implementable with O(1) complexity and no
resource (buffer or bandwidth) is statically allocated to
a given connection.

In operation, the scheduler 75 services each individual
queue i at a rate equal to r_i, which may be equal
for each queue or in accordance with a predetermined
weight. A particular queue (connection) that uses more
bandwidth, e.g., queue 30a at rate r_1, is likely to have a
longer queue than the other queues. If all the queue
connections are fully utilizing their respective bandwidth
allocations, then queue 30a will most likely experience
buffer overflow. If some of the allocated memory, e.g.,
queue 30c, is not fully utilized, then the fair-queuing
scheme of the invention enables data packets arriving
at queue 30b to utilize or borrow buffer space from the
underutilized queue 30c, on an as-needed basis, and
thus exceed the reserved allocation b_i of buffer queue
30a. If the second buffer receiving packets meant for the
first high rate buffer queue 30a becomes full, then another
underutilized buffer, e.g., queue 30i, may lend buffer
space for new data packets destined for queue 30a.

It should be understood that more than one queue
at a time may experience buffer overflow and hence
may borrow from an underutilized buffer space. Thus,
more than one queue may exceed its reserved allocation
b_i. It should also be understood that the invention
as described is applicable in both the forward and reverse
connections of the TCP connected network and
is equally applicable to both data and ACK traffic flow.

When a connection i needs more than b_i buffers, it
is allocated space from the available pool, provided of
course that the total occupancy is less than B.

Figure 3(a) illustrates the general flow diagram for
implementing the per-connection queue scheme of the
invention. When a packet arrives that is destined for a
connection i as shown at step 201, the first step 202 is
to check a counter containing the current remaining
available space of the buffer B and determine whether
there is enough remaining memory space in the buffer
to ensure accommodation of the newly arrived packet.
If there is enough remaining memory space in the buffer
to ensure accommodation of the newly arrived packet,
then the process continues at step 208 to identify the
connection i that the arrived packet belongs to and, at
step 211, to store that packet in the queue corresponding
to the connection. It should be understood that implicit
in this scheme is that the arriving packet(s) have
been properly classified and assigned to one of the
queues.
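
The admission path of Fig. 3(a) (check the free-space counter, identify the connection, store the packet) might be sketched as below. The class and method names are hypothetical, and the buffer is counted in packets rather than bytes for simplicity; classification of packets to connections is assumed to have happened already, as the text notes.

```python
from collections import deque


class SharedBuffer:
    """Shared buffer of total size B with one FIFO queue per connection."""

    def __init__(self, capacity):
        self.capacity = capacity  # total buffer size B, in packets (assumption)
        self.free = capacity      # counter of remaining available space
        self.queues = {}          # connection id -> deque of packets

    def try_enqueue(self, conn, packet):
        """Store the packet if space remains; otherwise report failure.

        Returns False when the buffer is full, in which case the caller
        would have to run a pushout scheme before retrying.
        """
        if self.free < 1:
            return False
        self.queues.setdefault(conn, deque()).append(packet)
        self.free -= 1
        return True


buf = SharedBuffer(capacity=2)
print(buf.try_enqueue("a", "p1"), buf.try_enqueue("b", "p2"))
print(buf.try_enqueue("a", "p3"))  # buffer is full at this point
```

Note that no space is statically reserved per connection here: any queue may grow as long as the shared counter shows free space.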

If it is determined at step 203 that there is not
enough remaining memory space in the buffer to ensure
accommodation of the newly arrived packet, e.g., queue
30j whose current occupancy q_j is less than b_j needs a
buffer, then a pushout scheme is implemented at step
225 to make room for the arriving packet(s). Specifically,
depending upon the implementation of the TCP protocol
invoked, two methods can be employed for choosing the
queue from which the pushout is done:

Specifically, as shown in Fig. 3(b), the pushout
mechanism in a first embodiment is an LQF scheme that
selects the queue that is borrowing the greatest amount
of memory reserved from another queue, i.e., the
connection i such that (q_i - b_i) is the largest over all
connections. Thus, as shown at step 250, a determination is
made as to the current buffer allocation of each queue
in the buffer. Then, at step 260, the current queue length
q_i of each queue is obtained and at step 270 a computation
is made as to the difference q_i - b_i for each queue.
Finally, at step 275, the queue having the largest difference
q_i - b_i is selected. Thus, the queue deviating most from
its reserved allocation b_i is treated as the longest queue and
hence a packet will be dropped from that queue first, in
one-to-one correspondence with the arriving packet, as indicated at
step 226 of Fig. 3(a).
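
The LQF selection of steps 250-275 can be illustrated with a short sketch. The function name and the dictionary representation of q_i and b_i are assumptions made for illustration; the linear scan shown here is the simplest form and not the O(1) structure discussed later in the description.

```python
def select_lqf(occupancy, allocation):
    """Return the connection i whose excess q_i - b_i is the largest.

    occupancy:  dict mapping connection id -> current queue length q_i
    allocation: dict mapping connection id -> nominal allocation b_i
    Ties are resolved in favour of the first connection encountered.
    """
    return max(occupancy, key=lambda i: occupancy[i] - allocation[i])


# Hypothetical occupancies and allocations for three connections.
q = {"a": 7, "b": 3, "c": 5}
b = {"a": 4, "b": 4, "c": 1}
print(select_lqf(q, b))  # excesses are a: 3, b: -1, c: 4
```

In this example queue "a" is the longest in absolute terms, but queue "c" is selected because it exceeds its own allocation by the most.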

It should be understood that skilled artisans may devise
other algorithms for effecting the longest queue first
pushout scheme discussed herein, and that the invention
is not restricted to the methodology depicted in Fig.
3(b). For instance, a longest delay first ("LDF") dropping
mechanism can be implemented, which is equal to the
LQF scheme when the allocated service rates r_i are all
equal because, if queues are being served at the same
rate, queues of the same length experience the same delay.
Analogously, if the service rates are unequal, the delays
would be different even if the queue lengths are the
same. Thus, the LQF scheme is a special instance of the
LDF.
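
As a rough illustration of the LDF idea, the sketch below assumes that the delay of queue i can be estimated as q_i / r_i (queue length divided by service rate). That estimate, like the function name, is an assumption for illustration and is not stated in the specification.

```python
def select_ldf(occupancy, rate):
    """Return the connection with the largest estimated delay q_i / r_i.

    occupancy: dict mapping connection id -> current queue length q_i
    rate:      dict mapping connection id -> allocated service rate r_i (> 0)
    """
    return max(occupancy, key=lambda i: occupancy[i] / rate[i])


# Unequal rates: the shorter queue has the longer estimated delay.
print(select_ldf({"a": 6, "b": 4}, {"a": 3.0, "b": 1.0}))
# Equal rates: the choice reduces to picking the longest queue.
print(select_ldf({"a": 6, "b": 4}, {"a": 1.0, "b": 1.0}))
```

With equal rates the ordering by q_i / r_i is the same as the ordering by q_i, which is the sense in which LQF is a special case of LDF.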

It is possible that the Longest Queue First scheme
may lead to excessive bursty losses when implemented
in a system with many connections having one queue
considerably longer than the second longest queue, i.e.,
two or more closely spaced packets are dropped
consecutively from the queue exceeding its allocation. For
instance, performance of a connection implementing
a TCP-Reno architecture will be adversely affected, as
TCP-Reno type implementations are known to behave
badly in the presence of bursty loss. Thus, in order to reduce
the amount of bursty loss, the above LQF scheme is
modified to employ a random generator that randomly
picks from those buffers exceeding their respective
allotments.

Specifically, in a second embodiment, each backlogged
connection i has a nominal buffer allocation of b_i
= B/n, where n is the number of backlogged connections.
As illustrated in Fig. 3(c), step 255, the memory
allocation b_i for each queue is obtained. Then, at step
265, the backlogged connections are grouped, e.g., into
two subsets: those with occupancy q_i greater than b_i,
and those with q_i ≤ b_i. From the set of queues above
their allocation, i.e., q_i > b_i, one is selected randomly as
indicated at step 285 and a packet is dropped from the
front as indicated at step 226.
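
A minimal sketch of the RND pushout of Fig. 3(c) follows, assuming queues are held as deques and occupancy is counted in packets. The function name and the optional rng parameter (useful for reproducible tests) are hypothetical.

```python
import random
from collections import deque


def rnd_pushout(queues, total_buffer, rng=random):
    """Drop one packet from the front of a randomly chosen over-allocation queue.

    queues:       dict mapping connection id -> deque of packets
    total_buffer: total shared buffer size B
    Returns the id of the connection that lost a packet, or None if no
    backlogged queue exceeds its nominal allocation b_i = B / n.
    """
    backlogged = [c for c, q in queues.items() if q]
    if not backlogged:
        return None
    allocation = total_buffer / len(backlogged)  # b_i = B / n
    over = [c for c in backlogged if len(queues[c]) > allocation]
    if not over:
        return None
    victim = rng.choice(over)
    queues[victim].popleft()  # drop from the front of the queue
    return victim


qs = {"a": deque([1, 2, 3, 4, 5]), "b": deque([6]), "c": deque()}
print(rnd_pushout(qs, total_buffer=6), list(qs["a"]))
```

Here two connections are backlogged, so b_i = 3; only queue "a" exceeds it and therefore loses its head-of-line packet.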

In an attempt to equalize buffer occupancy for different
connections and to provide optimal protection
from connections overloading the system, the selected
pushout scheme will drop packets from the front of the
longest queue in a manner similar to schemes implemented
for open-loop traffic such as described in L.
Georgiadis, I. Cidon, R. Guerin, and A. Khamisy, "Optimal
Buffer Sharing," IEEE J. Select. Areas Commun.,
vol. 13, pp. 1229-1240, Sept. 1995.

As shown in Fig. 4, from the hardware standpoint,
counters 90a, ..., 90i are shown associated with each
queue 30a, ..., 30i, with a control processor 92 programmed
to keep track of the occupancy q_i of its respective
associated queue. Thus, in method steps 260 and
265 in Figures 3(b) and 3(c), respectively, the current
queue lengths are obtained, e.g., by polling, or, e.g., by
locating from a table of registers such as table 99 in Fig.
4, the register indicating the longest queue length. Another
counter 95 is shown to keep track of the total buffer
occupancy B. Thus, when a packet arrives, the processor
92 provides a check at step 203 (Fig. 3(a)), to determine
the total memory available in the counter 95.

To determine the longest queue, processor 92 may
implement a sorted structure such that, at the time of
each enqueue, dequeue, or drop operation, after the
queue's corresponding counter has been accordingly
incremented or decremented, its queue occupancy value
q_i is compared to the current longest queue occupancy
value so that the longest queue structure is always
known.
A variation of these per-flow schemes can be efficiently
implemented by using a set of bins with exponentially
increasing size. Whenever a queue is modified
(enqueue, dequeue or drop), it is moved to the appropriate
bin (which is either the same, one above or below
the current bin), while the system is keeping track of the
highest occupied bin. When the buffer is full, any queue
from the highest occupied bin is selected and its first
packet dropped. If the buffer is measured in bytes, this
operation may have to be repeated until enough space
has been freed to accommodate the newly arrived packet,
due to variable packet sizes. To achieve true LQD,
the queues in the highest occupied bin would either
have to be maintained as a sorted list or searched for
the longest queue every time.
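
The binning variation can be sketched as follows. The choice of powers of two as bin boundaries (bin k holding occupancies from 2^k up to 2^(k+1) - 1), the class name, and the packet-count occupancy are all assumptions made for illustration; the text only requires exponentially increasing bin sizes.

```python
class BinnedQueues:
    """Approximate longest-queue tracking with exponentially sized bins."""

    def __init__(self):
        self.occupancy = {}  # connection id -> queue length q_i
        self.bins = {}       # bin index -> set of connection ids
        self.highest = -1    # index of the highest occupied bin, -1 if none

    @staticmethod
    def bin_index(q):
        # Occupancy 0 maps to -1 (no bin); 1 -> 0; 2..3 -> 1; 4..7 -> 2; ...
        return q.bit_length() - 1

    def update(self, conn, delta):
        """Apply an enqueue (delta=+1) or a dequeue/drop (delta=-1)."""
        old = self.occupancy.get(conn, 0)
        new = old + delta
        if new < 0:
            raise ValueError("queue occupancy cannot be negative")
        self.occupancy[conn] = new
        old_bin, new_bin = self.bin_index(old), self.bin_index(new)
        if old_bin != new_bin:
            if old_bin >= 0:
                self.bins[old_bin].discard(conn)
            if new_bin >= 0:
                self.bins.setdefault(new_bin, set()).add(conn)
        self.highest = max(self.highest, new_bin)
        # With single-packet updates this loop steps down at most one bin.
        while self.highest >= 0 and not self.bins.get(self.highest):
            self.highest -= 1

    def pick_victim(self):
        """Return any queue from the highest occupied bin, or None."""
        if self.highest < 0:
            return None
        return next(iter(self.bins[self.highest]))
```

For example, after five enqueues on connection "a" and two on "b", `pick_victim()` returns "a", since "a" sits in bin 2 and "b" in bin 1. Because any queue in the top bin may be chosen, the victim is within a factor of two of the true longest queue rather than exactly the longest.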

A simulation of the improved performance of the
buffer management schemes of the invention compared
to the FIFO-RED and FQ-RED schemes is now described.
The simulation system 119 is illustrated
in Figure 5 with sources 101a, ..., 101f having high-speed
access paths to a router 120 implementing the
per-connection flow buffer management scheme of the
invention, which is the sole bottleneck. The comparisons
were done using a mix of TCP Tahoe and TCP Reno
sources, bursty and greedy sources, one-way and two-way
traffic sources with reverse path congestion, and
widely differing round trip times. The access path delays
are set over a wide range to model different round trip
times and the destinations were assumed to ACK every
packet. For one-way traffic, ACKs are sent over a
non-congested path. For two-way traffic, the return path is
through the router and there may be queuing delays. In
particular, when the router uses FIFO scheduling, ACKs
and data packets are mixed in the queues. With fair
queuing, ACKs are handled as separate flows. For
asymmetric traffic, the bandwidth of the return link is
reduced from the destination to the router so that there is
considerable reverse path congestion and ACK loss.

The implementations of TCP Tahoe and TCP Reno
in the simulation system 119 model the TCP flow
and congestion control behavior of 4.3-Tahoe BSD and
4.3-Reno BSD, respectively. The RED model is packet
oriented and uses 25% of the buffer size as the minimum
threshold and 75% as the maximum threshold, the queue
weight being 0.002.
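
For orientation, the RED parameters quoted above can be plugged into a simplified RED calculation. The sketch below uses the stated thresholds (25% and 75% of the buffer) and queue weight (0.002); the maximum drop probability `max_p` is not given in the text and the value 0.1 is purely an assumption, as are the function names. The count-based adjustment of the full RED algorithm is omitted.

```python
def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def drop_probability(avg, buffer_size, max_p=0.1):
    """Simplified RED drop probability for an average queue length.

    min threshold = 25% of buffer, max threshold = 75% of buffer.
    max_p is a hypothetical value, not taken from the specification.
    """
    min_th = 0.25 * buffer_size
    max_th = 0.75 * buffer_size
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


# Hypothetical 100-packet buffer with an average queue of 50 packets.
print(drop_probability(50.0, 100))
```

With such a small queue weight the average reacts slowly, so short bursts change the drop probability very little.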

Figures 6(a) and 6(b) illustrate the improved performance
when fair-queuing LQD and RND drop methods
are implemented in the simulated TCP network from
the point of view of utilization and fairness, respectively,
as compared with prior art FIFO-RED and LQF methods.

Particularly, Figures 6(a) and 6(b) illustrate the improved TCP
performance measured by throughput (Fig. 6(a)) and
fairness coefficient (Fig. 6(b)) as a function of buffer size
for 20 TCP connections over an asymmetric bottleneck
link with 10 Mbps/100 Kbps capacity (TCP Tahoe and
Reno, 20 ms - 160 ms round trip time) in the simulated
network of Fig. 5.



As can be seen, both FQ- and FIFO-RED policies,
indicated by lines 137a and 138a respectively, have
poorer throughput than the fair-queuing LQF and RND
drop methods indicated by lines 139 and 140 because
the ACKs corresponding to retransmitted packets are
lost 66% of the time for the simulated asymmetry value
of three (note that this is not the bandwidth
asymmetry). This results in a timeout in at least
66% of TCP cycles, greatly reducing throughput. Other
timeouts happen because of multiple losses in the forward
path and losses of retransmitted packets in the forward
path. On the other hand, drop from front in the reverse
path eliminates these timeouts almost completely.
Since timeouts are expensive, both RED schemes have
poorer throughput than the other schemes including
FIFO-LQD.

Additionally, as shown in Fig. 6(b),
both FQ-RND and FQ-LQD work very well because they
combine the advantages of per-flow queuing with the
time-out elimination of drop-from-front. FQ-LQD has the
further advantage in that it has a built-in bias against
dropping retransmitted packets. This is because when
the source detects a loss by receipt of the first duplicate
ACK it stops sending packets. The retransmitted packet
is sent only after the third duplicate ACK is received.
During the intervening interval when the source is forced
by TCP to be silent, the queue corresponding to the flow
is drained at least at its minimum guaranteed rate and
therefore it is less likely to be the longest queue when
the retransmitted packet arrives. Hence the inherent bias
against dropping retransmitted packets. Though this
bias is not limited to asymmetric networks, the bias is
enhanced in asymmetric networks due to the slow reverse
channel dilating, by the asymmetry factor, the interval
between receipt of the first and third duplicate
ACKs. Since loss of retransmitted packets causes an
expensive time-out, this bias improves the performance
of FQ-LQD as indicated in Fig. 6(b) as line 141. FQ-RND,
indicated as line 142, has this bias as well, though to a
lesser degree. The reasoning is somewhat similar to
that for FQ-LQD: during the interval between receipt of
the first and third duplicate ACKs the flow's queue drains
at a rate equal to at least its guaranteed rate (since the
source is silent) and the queue occupancy could fall below
the reservation parameter for that flow. In that case,
when the retransmitted packet arrives, the retransmitted
packet is not lost even if the aggregate buffer is full. With
these advantages, FQ-LQD and FQ-RND have the best
performance.

The foregoing merely illustrates the principles
of the present invention. Those skilled in the art
will be able to devise various modifications which, although
not explicitly described or shown herein, embody
the principles of the invention and are thus within its spirit
and scope.
Claims

1. A method of improving performance of TCP connections
including the steps of:

partitioning a buffer of predetermined size into
a plurality of queues, each queue being allocated
an occupancy b_i for receiving and temporarily
storing packets of information and being
serviced by a scheduler for removing packets
from each buffer and transmitting said packets
over a said TCP connection; and, upon arrival
of a packet,

determining availability of queues for receiving
said packet and inputting said packet into a
queue if a said queue is available; and,
if said queue is not available, selecting a queue
and releasing a packet from said selected
queue to accommodate input of said packet,
wherein utilization of said connection is improved.

2. A method of improving performance of TCP connections
as claimed in Claim 1, further including the
step of tracking the current length q_i of each queue
at a packet arrival or departure event.

3. A method of improving performance of TCP connections
as claimed in Claim 2, wherein the tracking
step includes the step of incrementing a counter associated
with a said queue each time a packet is
input to said queue or decrementing said counter
when a packet is released from said queue.

4. A method of improving performance of TCP connections
as claimed in Claim 2, wherein said step
of selecting a queue includes:

establishing current allocation b_i for each
queue;
obtaining current queue length values q_i of
each said queue;

computing difference between current queue
length values q_i and allocated buffer occupancy
b_i for each queue; and,
selecting said queue having the largest computed
difference value.

5. A method of improving performance of a TCP network
connection as claimed in Claim 2, wherein
said step of selecting a queue includes:

establishing current allocation b_i for each
queue;

computing a set of one or more queues for
which current queue length values q_i exceed
allocated buffer occupancy b_i for each queue;
and

selecting a queue randomly from said set.

6. A router for communicating packets of information
from a plurality of sources to a single communication
link in a TCP/IP network, said router comprising:

a buffer of predetermined size partitioned into
a plurality of queues, each queue being allocated
an occupancy b_i for receiving and temporarily
storing packets of information;
a scheduler for removing packets from each
buffer and transmitting said packets over said
connection;

control means for determining availability of
queues in said buffer for inputting a received
packet into a queue if a said queue of said buffer
is available, and further selecting a queue
and enabling said scheduler to release a packet
from said selected queue to accommodate input
of said received packet when a said queue
of said buffer is not available.

7. A router as claimed in Claim 6, further comprising
means associated with each said queue for tracking
current length q_i of said queue each time a packet
is input to or released from said queue.

8. A router as claimed in Claim 7, wherein said control
means includes:

means for obtaining current queue length values
q_i of each said queue; and
means for computing difference between current
queue length values q_i and allocated buffer
occupancy b_i for each queue, wherein said
queue having the largest computed difference
value is selected.

9. A per-flow queuing apparatus for IP networks carrying
traffic from feedback controlled TCP connections
enabling flow of information packets from one
or more sources to a destination through a link, said
apparatus comprising:

a buffer of predetermined size partitioned into
a plurality of queues, each queue being allocated
an occupancy b_i for receiving and temporarily
storing packets of information;
a scheduler for removing packets from each
buffer according to a predetermined rate and
transmitting said packets over said network;
and

control device for determining availability of
queues in said buffer capable of receiving said
received packet and inputting said packet into
a queue if a said queue is available, said control
device further selecting a queue in accordance



EP 0 872 988 A2

with a longest queue first scheme and dropping
a packet from said selected queue to accommodate
input of said received packet when a
said queue is not available, whereby increased
fairness and packet throughput through said
link is achieved.

10. A per-flow queuing method for IP networks carrying
traffic from feedback controlled TCP connections
enabling flow of information packets from one or
more sources to a destination through a link, said
method comprising:

providing a buffer of predetermined size partitioned
into a plurality of queues, each queue
being allocated an occupancy b_i for receiving
and temporarily storing packets of information;
providing a scheduler for removing packets
from each buffer according to a predetermined
rate and transmitting
said packets over said network; and

determining availability of queues in said buffer
capable of receiving said received packet and
inputting said packet into a queue if a said
queue is available, said control device further
selecting a queue in accordance with a random
drop scheme and dropping a packet from said
selected queue to accommodate input of said
received packet when a said queue is not available,
whereby increased fairness and packet
throughput through said link is achieved.
